On Lasso-type estimation for dynamical systems with small noise 
Abstract



We consider a dynamical system with small noise where the drift is parametrized by a 
finite dimensional parameter. For this model we consider minimum distance estimation from 
continuous time observations under some penalty imposed on the parameters in the spirit of the 
Lasso approach. This approach allows for simultaneous estimation and model selection for this
model.

1 Introduction 
Knight and Fu (2000) considered the linear regression model $Y_i = x_i^T \beta + \epsilon_i$, with $x_i$ a vector of
covariates, $\beta$ a vector of parameters and $\epsilon_i$ i.i.d. Gaussian random variables. They proposed to use
the so-called Bridge estimators (Frank and Friedman, 1993) $\hat\beta_n$, solutions of

\[
\hat\beta_n := \arg\min_{u} \left( \sum_{i=1}^{n} (Y_i - x_i^T u)^2 + \lambda_n \sum_{j=1}^{p} |u_j|^{\gamma} \right) \tag{1}
\]


for some $\gamma > 0$ and $\lambda_n \to 0$ as $n \to \infty$. The estimators $\hat\beta_n$ solving (1) are a generalization of the
Ridge estimators, which correspond to the case $\gamma = 2$. The usual Lasso-type estimators (Tibshirani,
1996) are obtained by setting $\gamma = 1$. The estimators solving (1) are attractive because with them
it is possible to perform estimation and model selection in a single step, i.e. the procedure does
not need to estimate different models in a first stage and compare them later with, e.g., information
criteria (see e.g. Uchida and Yoshida, 2004). Indeed, the dimension of the parameter space does
not change; just some of the components of the vector $\beta$ are assumed to be zero. As mentioned in
Knight and Fu (2000), in the limit as $\gamma \to 0$, this procedure approximates the AIC or BIC selection
methods, i.e.



\[
\lim_{\gamma \to 0} \sum_{j=1}^{p} |u_j|^{\gamma} = \sum_{j=1}^{p} 1_{\{u_j \neq 0\}},
\]

with $1_A$ the indicator function of the set $A$. Efron et al. (2004) proved that, for some models, the
solution of (1) has the same computational complexity as the standard OLS method, which makes
the approach appealing from both the theoretical and the computational point of view. In non-linear
models a preliminary simple reparametrization (e.g. $\beta \mapsto -\beta$) is needed to interpret this
approach in terms of model selection.
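
As a small numerical illustration of the Bridge penalty discussed above, the following Python sketch evaluates $\sum_j |u_j|^\gamma$ for several values of $\gamma$. The function name `bridge_penalty` and the test vector are our own illustrative choices. As $\gamma$ decreases towards zero the penalty approaches the number of non-zero components, which is the sense in which the procedure approximates AIC- or BIC-type selection.

```python
def bridge_penalty(u, gamma):
    """Bridge penalty sum_j |u_j|**gamma for gamma > 0.

    gamma = 2 gives the Ridge penalty and gamma = 1 the Lasso penalty.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sum(abs(x) ** gamma for x in u)


# Illustrative vector with three non-zero components (values are arbitrary).
u = [0.0, 2.0, -3.0, 0.0, 0.5]
for gamma in (2.0, 1.0, 0.5, 0.01):
    print(gamma, bridge_penalty(u, gamma))
```

For very small $\gamma$ the printed value is close to 3, the number of non-zero entries of `u`.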

In this work, we consider a problem similar to (1) for the class of diffusion processes with small
noise and replace the least squares approach with minimum distance estimation. The asymptotic
properties of the minimum distance estimators in the i.i.d. framework have been established in
Millar (1983, 1984). Later Kutoyants (1991, 1994) and Kutoyants and Pilibossian (1994) studied
in detail the properties of such estimators for diffusion processes with small noise. This work
relies mainly on Chapter 7 of Kutoyants (1994) and extends part of the results in it to the case
of constrained parameters. The other basic reference for this paper is the work of Knight and Fu
(2000), which contains the general idea on how to study the properties of constrained least squares
estimators; we apply it here to a class of minimum distance estimators of the drift of dynamical
systems with small noise.



2 The Lasso-type problem for dynamical systems 



Let $\{X_t, 0 \le t \le T\}$ be a stochastic process solution of the following stochastic differential equation

\[
dX_t = S_t(\theta, X)\,dt + \varepsilon\,dW_t \tag{2}
\]

with non-random initial condition $X_0 = x_0$. The parameter $\theta \in \Theta \subset \mathbb{R}^p$ is supposed to be
unknown. $P_\theta$ denotes the law induced by the process $X$ when the true parameter is $\theta$. We denote by
$u = (u_1, \ldots, u_p)$ a vector $u \in \mathbb{R}^p$ and by $\theta^*$ the true value of $\theta$. Let $\|\cdot\| = \|\cdot\|_{L^2(\mu)}$ be the $L^2$
norm with respect to some measure $\mu$ on $[0, T]$, i.e.

\[
\|f\|^2 = \int_0^T f^2(t)\,\mu(dt).
\]

Although the results in the present work generalize to different metrics, as explained in Kutoyants
(1994), for simplicity we consider only the $L^2$-type distance. We suppose that the trend coefficient
in (2) is of integral type, i.e.

\[
S_t(\theta, X) = V(\theta, t, X_t) + \int_0^t K(\theta, t, s, X_s)\,ds,
\]

where the functions $V(\theta, t, x)$ and $K(\theta, t, s, x)$ are such that (2) has a unique strong solution. For
example, the usual conditions (1.34) and (1.35) in Kutoyants (1994) about Lipschitz behaviour and
linear growth are sufficient. The asymptotics in this model are considered as $\varepsilon \to 0$.
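
To fix ideas, the following Python sketch simulates the small-noise model by the Euler-Maruyama scheme for a hypothetical drift $S_t(\theta, X) = -\theta X_t$ (that is, $V(\theta, t, x) = -\theta x$ and $K = 0$). This drift, the function names and all numerical values are illustrative assumptions and are not taken from the references. The simulated path is compared with the noise-free solution $x_0 e^{-\theta t}$ as $\varepsilon$ decreases.

```python
import math
import random


def simulate_path(theta, eps, x0=1.0, T=1.0, n=1000, seed=0):
    """Euler-Maruyama path of dX_t = -theta * X_t dt + eps dW_t on [0, T]."""
    rng = random.Random(seed)
    dt = T / n
    x = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x.append(x[-1] - theta * x[-1] * dt + eps * dw)
    return x


def limit_path(theta, x0=1.0, T=1.0, n=1000):
    """Noise-free solution x_t = x0 * exp(-theta * t) on the same time grid."""
    return [x0 * math.exp(-theta * T * i / n) for i in range(n + 1)]


def sup_distance(a, b):
    """Maximum absolute difference between two paths on a common grid."""
    return max(abs(p - q) for p, q in zip(a, b))


for eps in (0.5, 0.1, 0.01):
    d = sup_distance(simulate_path(2.0, eps), limit_path(2.0))
    print(f"eps={eps}: sup |X_t - x_t| = {d:.4f}")
```

Because the same seed is used for every noise level, the printed distances shrink roughly in proportion to $\varepsilon$, up to the discretization error of the scheme.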

We will also write $x(\theta) = x_t(\theta)$ to denote the limiting dynamical system satisfying the
integro-differential equation

\[
\frac{dx_t}{dt} = V(\theta, t, x_t) + \int_0^t K(\theta, t, s, x_s)\,ds, \qquad x_0.
\]

Let $x_t^{(1)} = x_t^{(1)}(\theta^*)$ be the Gaussian process solution to

\[
dx_t^{(1)} = \left( V_x(\theta^*, t, x_t(\theta^*))\,x_t^{(1)} + \int_0^t K_x(\theta^*, t, s, x_s(\theta^*))\,x_s^{(1)}\,ds \right) dt + dW_t, \quad 0 \le t \le T, \tag{3}
\]

with $x_0^{(1)} = 0$, where $V_x(\theta, t, x)$ and $K_x(\theta, t, s, x)$ are the partial derivatives of $V(\theta, t, x)$ and $K(\theta, t, s, x)$
with respect to the argument $x$. The process $x^{(1)}$ plays a central role in the definition of the asymptotic
distribution of the estimators in the theory of dynamical systems with small noise.

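For this model we consider a constrained minimum distance estimator based on the following
penalized contrast

\[
Z_\varepsilon(u) = \|X - x(u)\|^2 + \lambda_\varepsilon \sum_{j=1}^{p} |u_j|^{\gamma}, \tag{4}
\]

with $\gamma > 0$, $u \in \Theta$ and $\lambda_\varepsilon > 0$ a real sequence.

In analogy to (1), we introduce the Lasso-type estimator for $\theta$, defined as the solution of

\[
\hat\theta^{\varepsilon} = \arg\min_{\theta \in \Theta} Z_\varepsilon(\theta).
\]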

As mentioned in the Introduction, the properties of non-penalized minimum distance estimators for
model (2) have been proved in Section 7 in Kutoyants (1994); hence all the proofs in this work rely
on the techniques developed in that reference, with adjustments for the penalization.
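
The definition of the Lasso-type estimator can be sketched numerically as follows. The example assumes a hypothetical toy drift $S_t(\theta, X) = \theta_1 + \theta_2 t$ with $x_0 = 0$, so that $x_t(\theta) = \theta_1 t + \theta_2 t^2/2$ and $X_t = x_t(\theta^*) + \varepsilon W_t$; it takes $\mu$ to be the Lebesgue measure, approximates the $L^2$ norm by a Riemann sum, and minimizes the penalized contrast by a crude grid search. The model, the grid, the function names and the tuning values are our assumptions for illustration only.

```python
import math
import random


def limit_path(theta, times):
    """x_t(theta) for the toy drift S_t = theta_1 + theta_2 * t with x_0 = 0."""
    return [theta[0] * t + 0.5 * theta[1] * t * t for t in times]


def observe(theta_star, eps, times, seed=0):
    """X_t = x_t(theta*) + eps * W_t on the time grid (exact for this drift)."""
    rng = random.Random(seed)
    w, path = 0.0, []
    for i, x in enumerate(limit_path(theta_star, times)):
        if i > 0:
            w += rng.gauss(0.0, math.sqrt(times[i] - times[i - 1]))
        path.append(x + eps * w)
    return path


def contrast(u, path, times, lam, gamma):
    """Penalized contrast ||X - x(u)||^2 + lam * sum_j |u_j|^gamma (Riemann sum)."""
    dt = times[1] - times[0]
    dist2 = sum((p - q) ** 2 for p, q in zip(path, limit_path(u, times))) * dt
    return dist2 + lam * sum(abs(c) ** gamma for c in u)


def lasso_mde(path, times, lam, gamma, grid):
    """Grid-search minimizer of the penalized contrast over grid x grid."""
    candidates = ((a, b) for a in grid for b in grid)
    return min(candidates, key=lambda u: contrast(u, path, times, lam, gamma))


times = [i / 100 for i in range(101)]
grid = [k / 10 for k in range(-20, 21)]
eps = 0.05
path = observe((0.0, 1.0), eps, times)  # true parameter has a null component
print("unpenalized:", lasso_mde(path, times, 0.0, 1.0, grid))
print("penalized:  ", lasso_mde(path, times, eps, 1.0, grid))
```

A grid search is used only to keep the sketch dependency-free; for $\gamma \ge 1$ the contrast of this toy model is convex in $u$ and any convex solver could be used instead.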

We need in addition the following assumptions. 

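Assumption 1. The stochastic process $X$ is differentiable in $\varepsilon$ at the point $\varepsilon = 0$ in the following
sense

\[
P_{\theta^*}\text{-}\lim_{\varepsilon \to 0} \left\| \varepsilon^{-1}(X - x(\theta^*)) - x^{(1)} \right\| = 0,
\]

where $x^{(1)} = \{x_t^{(1)}, 0 \le t \le T\}$ is from (3).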

Assumption 2. The deterministic dynamical system $x_t(\theta)$ is differentiable in $\theta$ at the point $\theta^*$ in
the following sense

\[
\| x(\theta^* + h) - x(\theta^*) - h^T \dot{x}(\theta^*) \| = o(|h|),
\]

where $h \in \mathbb{R}^p$.

We further denote by $\dot{x}_t(\theta)$ the $p$-dimensional vector of partial derivatives of $x_t(\theta)$ with respect to
$\theta_j$, $j = 1, \ldots, p$.

Assumption 3. The Fisher information matrix

\[
I(\theta) = \int_0^T \dot{x}_t(\theta)\,\dot{x}_t(\theta)^T\,d\mu_t
\]

is positive definite.

2.1 Consistency of the estimator 

Theorem 1. Let Assumptions 1-3 hold and let $\varepsilon^{-1}\lambda_\varepsilon \to \lambda_0 \ge 0$ as $\varepsilon \to 0$. Then $\hat\theta^{\varepsilon}$ is a consistent
estimator of $\theta^*$.
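Proof. Let

\[
g^{\lambda}_{\theta^*}(\nu) = \inf_{|\theta - \theta^*| > \nu} \left\{ \|x(\theta) - x(\theta^*)\| + \lambda_\varepsilon \sum_{j=1}^{p} |\theta_j|^{\gamma} \right\}
\]

and

\[
g^{\lambda}(\nu) = \inf_{\theta^* \in \Theta} g^{\lambda}_{\theta^*}(\nu),
\]

where $|\theta - \theta^*| > \nu$ is to be understood componentwise, for a real constant $\nu$. By definition of $\hat\theta^{\varepsilon}$ we
have that

\[
\left\{ \omega : |\hat\theta^{\varepsilon} - \theta^*| > \nu \right\} = \left\{ \omega : \inf_{|\theta - \theta^*| < \nu} Z_\varepsilon(\theta) > \inf_{|\theta - \theta^*| \ge \nu} Z_\varepsilon(\theta) \right\}.
\]

Moreover,

\[
Z_\varepsilon(\theta) \le \|X - x(\theta^*)\| + \|x(\theta) - x(\theta^*)\| + \lambda_\varepsilon \sum_{j=1}^{p} |\theta_j|^{\gamma}
\]

and

\[
Z_\varepsilon(\theta) \ge \|x(\theta) - x(\theta^*)\| - \|X - x(\theta^*)\| + \lambda_\varepsilon \sum_{j=1}^{p} |\theta_j|^{\gamma},
\]

with

\[
\inf_{|\theta - \theta^*| < \nu} \|x(\theta) - x(\theta^*)\| = 0.
\]

Then

\[
\begin{aligned}
P_{\theta^*}\left( |\hat\theta^{\varepsilon} - \theta^*| > \nu \right)
&\le P_{\theta^*}\left( \inf_{|\theta - \theta^*| < \nu} Z_\varepsilon(\theta) > \inf_{|\theta - \theta^*| \ge \nu} Z_\varepsilon(\theta) \right) \\
&\le P_{\theta^*}\left( \|X - x(\theta^*)\| + \lambda_\varepsilon \sum_{j=1}^{p} |\theta^*_j|^{\gamma} > g^{\lambda}_{\theta^*}(\nu) - \|X - x(\theta^*)\| \right) \\
&= P_{\theta^*}\left( \|X - x(\theta^*)\| > \frac{1}{2}\left\{ g^{\lambda}_{\theta^*}(\nu) - \lambda_\varepsilon \sum_{j=1}^{p} |\theta^*_j|^{\gamma} \right\} \right).
\end{aligned}
\]

Take the supremum on $t \in [0, T]$ and the infimum for $\theta^* \in \Theta$ and obtain

\[
\begin{aligned}
P_{\theta^*}\left( |\hat\theta^{\varepsilon} - \theta^*| > \nu \right)
&\le P\left( C \varepsilon \sup_{0 \le t \le T} |W_t| > \frac{1}{2}\left\{ g^{\lambda}(\nu) - \lambda_\varepsilon \inf_{\theta^* \in \Theta} \sum_{j=1}^{p} |\theta^*_j|^{\gamma} \right\} \right) \\
&\le 2 \exp\left\{ - \frac{\left( g^{\lambda}(\nu) - \lambda_\varepsilon \inf_{\theta^* \in \Theta} \sum_{j=1}^{p} |\theta^*_j|^{\gamma} \right)^2}{8 C^2 T \varepsilon^2} \right\} \to 0.
\end{aligned}
\]

In the above we made use of Gronwall's Lemma, which gives

\[
\|X - x(\theta)\| \le C \varepsilon \sup_{0 \le t \le T} |W_t|,
\]

with $C$ independent of $\theta$ and $\varepsilon$, and of the following estimate

\[
P\left( \sup_{0 \le t \le T} |W_t| > N \right) \le \min\left( 2, \frac{4}{N} \sqrt{\frac{T}{2\pi}} \right) e^{-\frac{N^2}{2T}},
\]

see e.g. formula (1.49) in Kutoyants (1994). □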
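
The tail estimate for $\sup_{0 \le t \le T} |W_t|$ used at the end of the proof can be checked by simulation. The sketch below assumes the estimate has the form $P(\sup_{0 \le t \le T} |W_t| > N) \le \min(2, (4/N)\sqrt{T/(2\pi)})\,e^{-N^2/(2T)}$ and compares it with a Monte Carlo frequency computed on a discrete time grid; function names, sample sizes and the seed are our illustrative choices.

```python
import math
import random


def tail_bound(N, T):
    """Right-hand side min(2, (4/N) * sqrt(T / (2*pi))) * exp(-N**2 / (2*T))."""
    return min(2.0, 4.0 / N * math.sqrt(T / (2.0 * math.pi))) * math.exp(-N * N / (2.0 * T))


def empirical_tail(N, T, n_paths=2000, n_steps=100, seed=1):
    """Monte Carlo frequency of {max_k |W_{t_k}| > N} on a regular grid of [0, T].

    The grid maximum never exceeds the continuous-time supremum, so this
    slightly underestimates the true probability.
    """
    rng = random.Random(seed)
    sd = math.sqrt(T / n_steps)
    hits = 0
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            if abs(w) > N:
                hits += 1
                break
    return hits / n_paths


for N in (1.0, 1.5, 2.0):
    print(N, empirical_tail(N, 1.0), tail_bound(N, 1.0))
```

The simulated frequencies are subject to Monte Carlo and discretization error, so they only give a rough check that the bound is of the right order.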

From the proof of the consistency of the estimator $\hat\theta^{\varepsilon}$ it is clear that the speed of convergence
depends on the speed of $\lambda_\varepsilon$. The speed of $\lambda_\varepsilon$ also affects the asymptotic distribution of the estimator.



2.2 Asymptotic distribution of the estimator 

In order to study the asymptotic distribution of the Lasso-type estimator we need to distinguish
the different cases for $\gamma$. We start with the case of large $\gamma$'s. We denote by "$\to_d$" the convergence
in distribution and we denote by $\zeta$ the following Gaussian random vector

\[
\zeta = \int_0^T x_t^{(1)}\,\dot{x}_t(\theta^*)\,\mu(dt). \tag{5}
\]


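Theorem 2. Let Assumptions 1-3 hold, $\zeta$ defined as in (5), $\gamma \ge 1$ and $\varepsilon^{-1}\lambda_\varepsilon \to \lambda_0 \ge 0$. Then

\[
\varepsilon^{-1}(\hat\theta^{\varepsilon} - \theta^*) \to_d \arg\min_{u} V(u)
\]

where

\[
V(u) = -2u^T \zeta + u^T I(\theta^*) u + \lambda_0 \gamma \sum_{j=1}^{p} u_j\,\mathrm{sgn}(\theta^*_j)\,|\theta^*_j|^{\gamma - 1}
\]

for $\gamma > 1$ and

\[
V(u) = -2u^T \zeta + u^T I(\theta^*) u + \lambda_0 \sum_{j=1}^{p} \left( |u_j|\,1_{\{\theta^*_j = 0\}} + u_j\,\mathrm{sgn}(\theta^*_j)\,1_{\{\theta^*_j \neq 0\}} \right)
\]

if $\gamma = 1$.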

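Proof. Let $u \in \mathbb{R}^p$ and introduce the quantity

\[
V_\varepsilon(u) = \frac{1}{\varepsilon^2} \left\{ \|X - x(\theta^* + \varepsilon u)\|^2 - \|X - x(\theta^*)\|^2 + \lambda_\varepsilon \sum_{j=1}^{p} \left\{ |\theta^*_j + \varepsilon u_j|^{\gamma} - |\theta^*_j|^{\gamma} \right\} \right\}, \tag{6}
\]

which is minimized at the point $\varepsilon^{-1}(\hat\theta^{\varepsilon} - \theta^*)$ by definition of $\hat\theta^{\varepsilon}$. Then, by Assumptions 1 and 2,

\[
\begin{aligned}
\frac{1}{\varepsilon^2} \left\{ \|X - x(\theta^* + \varepsilon u)\|^2 - \|X - x(\theta^*)\|^2 \right\}
&= \frac{1}{\varepsilon^2} \left\{ \|X - x(\theta^*) - \varepsilon u^T \dot{x}(\theta^*)\|^2 - \|X - x(\theta^*)\|^2 \right\} + o_\varepsilon(1) \\
&= u^T \left( \int_0^T \dot{x}_t(\theta^*)\,\dot{x}_t(\theta^*)^T \mu(dt) \right) u - 2 u^T \int_0^T \varepsilon^{-1}(X_t - x_t(\theta^*))\,\dot{x}_t(\theta^*)\,\mu(dt) + o_\varepsilon(1) \\
&\to u^T I(\theta^*) u - 2 u^T \zeta \quad \text{as } \varepsilon \to 0,
\end{aligned}
\]

where $\zeta$ is from (5). For the term

\[
\frac{\lambda_\varepsilon}{\varepsilon^2} \sum_{j=1}^{p} \left\{ |\theta^*_j + \varepsilon u_j|^{\gamma} - |\theta^*_j|^{\gamma} \right\}
\]

we have to distinguish the cases $\gamma = 1$ and $\gamma > 1$. Let $\gamma > 1$, then

\[
\frac{\lambda_\varepsilon}{\varepsilon^2} \sum_{j=1}^{p} \left\{ |\theta^*_j + \varepsilon u_j|^{\gamma} - |\theta^*_j|^{\gamma} \right\} \to \lambda_0 \gamma \sum_{j=1}^{p} u_j\,\mathrm{sgn}(\theta^*_j)\,|\theta^*_j|^{\gamma - 1}.
\]

If $\gamma = 1$, then by similar arguments, we have

\[
\frac{\lambda_\varepsilon}{\varepsilon^2} \sum_{j=1}^{p} \left\{ |\theta^*_j + \varepsilon u_j| - |\theta^*_j| \right\} \to \lambda_0 \sum_{j=1}^{p} \left( |u_j|\,1_{\{\theta^*_j = 0\}} + u_j\,\mathrm{sgn}(\theta^*_j)\,1_{\{\theta^*_j \neq 0\}} \right).
\]

Notice that $V_\varepsilon(u)$ is convex in $u$, since we can write the first term as

\[
u^T k_1 u - 2 u^T k_2 + o_\varepsilon(1),
\]

with $k_1 = \int_0^T \dot{x}_t(\theta^*)\,\dot{x}_t(\theta^*)^T \mu(dt)$ positive definite by Assumption 3 and $k_2 = \int_0^T \varepsilon^{-1}(X_t - x_t(\theta^*))\,\dot{x}_t(\theta^*)\,\mu(dt)$ not
depending on $u$, and the second term is also convex for $\gamma \ge 1$. Thus $V_\varepsilon(u) \to_d V(u)$ and, due to the
convexity of $V_\varepsilon$,

\[
\arg\min V_\varepsilon = \varepsilon^{-1}(\hat\theta^{\varepsilon} - \theta^*) \to_d \arg\min V.
\]

Notice that the result on the convergence of the argmin using convexity dates back to Pollard (1991) and Geyer
(1994, 1996), and a modern account can be found in Kato (2009). □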
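
To illustrate the limit in the Lasso case $\gamma = 1$, consider a scalar parameter with $\theta^* = 0$, for which the limiting objective reads $V(u) = -2u\zeta + I u^2 + \lambda_0 |u|$. Its minimizer is the soft-thresholded value $\mathrm{sgn}(\zeta)\max(|\zeta| - \lambda_0/2, 0)/I$, obtained from the first-order condition. The sketch below implements this closed form; the function names, the unit variance assumed for $\zeta$ and the numerical values are hypothetical and chosen only for illustration.

```python
import random


def limit_objective(u, zeta, info, lam0):
    """V(u) = -2*u*zeta + info*u**2 + lam0*|u| (scalar case, theta* = 0, gamma = 1)."""
    return -2.0 * u * zeta + info * u * u + lam0 * abs(u)


def argmin_limit(zeta, info, lam0):
    """Closed-form minimizer of V: soft-thresholding of zeta at lam0 / 2."""
    shrunk = max(abs(zeta) - lam0 / 2.0, 0.0)
    return (shrunk if zeta >= 0 else -shrunk) / info


# Fraction of draws of zeta for which the limiting minimizer is exactly zero.
rng = random.Random(0)
draws = [argmin_limit(rng.gauss(0.0, 1.0), 1.0, 1.0) for _ in range(10000)]
print("share of exact zeros:", sum(d == 0.0 for d in draws) / len(draws))
```

The minimizer is exactly zero whenever $|\zeta| \le \lambda_0/2$, so for $\lambda_0 > 0$ the limiting distribution puts positive mass at zero.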

In the case $0 < \gamma < 1$ the convexity argument cannot be applied; moreover, some rate of convergence
is required of the sequence $\lambda_\varepsilon$.

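Theorem 3. Let Assumptions 1-3 hold, $\zeta$ defined as in (5), $0 < \gamma < 1$ and $\lambda_\varepsilon / \varepsilon^{2 - \gamma} \to \lambda_0 \ge 0$.
Then

\[
\varepsilon^{-1}(\hat\theta^{\varepsilon} - \theta^*) \to_d \arg\min_{u} V(u)
\]

where

\[
V(u) = -2u^T \zeta + u^T I(\theta^*) u + \lambda_0 \sum_{j=1}^{p} |u_j|^{\gamma}\,1_{\{\theta^*_j = 0\}}.
\]
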



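Proof. As before we start with $V_\varepsilon(u)$ from (6). The first part of the expression in $V_\varepsilon(u)$ converges
in distribution to $-2u^T \zeta + u^T I(\theta^*) u$ as in Theorem 2. For the second term, we need to distinguish
the two cases $\theta^*_j = 0$ and $\theta^*_j \neq 0$. By assumption we have that $\lambda_\varepsilon / \varepsilon^{2 - \gamma} \to \lambda_0$ and hence necessarily
$\lambda_\varepsilon / \varepsilon \to 0$. So, if $\theta^*_j \neq 0$, we have that

\[
\frac{\lambda_\varepsilon}{\varepsilon^2} \left( |\theta^*_j + \varepsilon u_j|^{\gamma} - |\theta^*_j|^{\gamma} \right) = \frac{\lambda_\varepsilon}{\varepsilon} \cdot \frac{|\theta^*_j + \varepsilon u_j|^{\gamma} - |\theta^*_j|^{\gamma}}{\varepsilon} \to 0,
\]

because the difference quotient stays bounded. Conversely, if $\theta^*_j = 0$ we have that

\[
\frac{\lambda_\varepsilon}{\varepsilon^2} \sum_{j=1}^{p} \left( |\theta^*_j + \varepsilon u_j|^{\gamma} - |\theta^*_j|^{\gamma} \right) \to \lambda_0 \sum_{j=1}^{p} |u_j|^{\gamma}\,1_{\{\theta^*_j = 0\}}.
\]

So, as a whole, $V_\varepsilon(u) \to_d V(u)$. Following Kim and Pollard (1990), the final step consists in showing
that $\arg\min V_\varepsilon = O_p(1)$ and so $\arg\min V_\varepsilon \to_d \arg\min V$. Indeed,

\[
V_\varepsilon(u) \ge \frac{1}{\varepsilon^2} \left( \|X - x(\theta^* + \varepsilon u)\|^2 - \|X - x(\theta^*)\|^2 \right) - \frac{\lambda_\varepsilon}{\varepsilon^2} \sum_{j=1}^{p} |\varepsilon u_j|^{\gamma}
\]

and, for all $u$, for $\varepsilon$ sufficiently small and $\delta > 0$, we have

\[
V_\varepsilon(u) \ge \frac{1}{\varepsilon^2} \left( \|X - x(\theta^* + \varepsilon u)\|^2 - \|X - x(\theta^*)\|^2 \right) - (\lambda_0 + \delta) \sum_{j=1}^{p} |u_j|^{\gamma} = V_\varepsilon^{\delta}(u).
\]

The term $|u_j|^{\gamma}$ grows slower than the first normed terms in $V_\varepsilon^{\delta}(u)$, so $\arg\min V_\varepsilon^{\delta}(u) = O_p(1)$ and,
in turn, $\arg\min V_\varepsilon(u)$ is also $O_p(1)$. Since $\arg\min V(u)$ is unique, the theorem is proved. □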

Remark 1. Theorem 3 shows that, if $\gamma < 1$, one can estimate the nonzero parameters $\theta^*_j \neq 0$ at
the usual rate without introducing asymptotic bias due to the penalization and, at the same time,
shrink the estimates of the null parameters $\theta^*_j = 0$ toward zero with positive probability. On the
contrary, if $\gamma \ge 1$, nonzero parameters are estimated with some asymptotic bias if $\lambda_0 > 0$. This is
a well known result in the literature.
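
The shrinkage of null parameters for $0 < \gamma < 1$ can be seen numerically in the scalar case with $\theta^* = 0$, where the limiting objective is $V(u) = -2u\zeta + I u^2 + \lambda_0 |u|^{\gamma}$. Since this function is not convex, the sketch below simply minimizes it over a fine grid. The function names, the grid and the values of $\zeta$, $I$ and $\lambda_0$ are hypothetical choices made for illustration.

```python
def limit_objective_bridge(u, zeta, info, lam0, gamma):
    """V(u) = -2*u*zeta + info*u**2 + lam0*|u|**gamma (scalar case, theta* = 0)."""
    return -2.0 * u * zeta + info * u * u + lam0 * abs(u) ** gamma


def argmin_on_grid(zeta, info, lam0, gamma, half_width=5.0, step=0.001):
    """Grid minimizer of V over [-half_width, half_width]; the grid contains 0 exactly."""
    n = int(round(half_width / step))
    grid = [k * step for k in range(-n, n + 1)]
    return min(grid, key=lambda u: limit_objective_bridge(u, zeta, info, lam0, gamma))


# Small |zeta| is thresholded to zero, large |zeta| is only mildly shrunk.
for zeta in (0.2, 0.5, 3.0):
    print(zeta, argmin_on_grid(zeta, 1.0, 1.0, 0.5))
```

For small $|\zeta|$ the penalty dominates near the origin and the minimizer is exactly zero, while for large $|\zeta|$ the minimizer stays close to the unpenalized value $\zeta / I$.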